AITopics | right image

Overleaf Example

Neural Information Processing Systems | Jun-15-2026, 12:55:53 GMT

This section outlines the design and evaluation of distractor choices in our VQA dataset, which play a critical role in determining question difficulty and diagnostic value. We begin by examining the impact of introducing a "None of the Above" (NAB%) option, which systematically increases task ambiguity and reduces model performance across the board (Figure 1). We then detail the principles and heuristics used to generate diverse and context-aware distractors for different question types. These include binary negations, categorical sampling, spatial reasoning perturbations, and content-aware language distractors. Special emphasis is placed on generating plausible incorrect choices that reflect partial knowledge, ambiguity, or visually confusable elements. Finally, we describe how randomized shuffling and probabilistic replacement with NAB options further strengthen the challenge by discouraging rote pattern matching. Together, these strategies enhance the dataset's ability to probe fine-grained reasoning, visual grounding, and robustness to uncertainty in large vision-language models.
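The shuffling and probabilistic NAB replacement described above can be sketched as follows. This is a minimal illustration, not the dataset authors' code: the function name `build_choices`, the `nab_prob` parameter, and the choice to replace the correct answer (so that NAB becomes the right option) are all assumptions made for the example.

```python
import random

NAB = "None of the Above"  # label text is an assumption for this sketch


def build_choices(correct, distractors, nab_prob=0.25, rng=None):
    """Return (options, answer_index) for one multiple-choice question.

    With probability nab_prob the correct answer is withheld and NAB takes
    its place, so NAB becomes the right choice. Otherwise the correct answer
    is kept. Options are shuffled so position carries no signal.
    Assumes the distractors are distinct from the correct answer and from NAB.
    """
    rng = rng or random.Random()
    answer = NAB if rng.random() < nab_prob else correct
    options = list(distractors) + [answer]
    rng.shuffle(options)
    return options, options.index(answer)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(build_choices("red", ["blue", "green", "yellow"], 0.5, rng))
```

Passing a seeded `random.Random` keeps the generated option order reproducible across runs, which matters when a benchmark is regenerated.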

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

6794f555524c9069e26970a408d353cc-Supplemental-Conference.pdf

Neural Information Processing Systems | Aug-15-2025, 12:25:01 GMT

artificial intelligence, epe bad 1, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

StereoMamba: Real-time and Robust Intraoperative Stereo Disparity Estimation via Long-range Spatial Dependencies

Wang, Xu, Xu, Jialang, Zhang, Shuai, Huang, Baoru, Stoyanov, Danail, Mazomenos, Evangelos B.

arXiv.org Artificial Intelligence | Apr-25-2025

Abstract: Stereo disparity estimation is crucial for obtaining depth information in robot-assisted minimally invasive surgery (RAMIS). While current deep learning methods have made significant advancements, challenges remain in achieving an optimal balance between accuracy, robustness, and inference speed. To address these challenges, we propose the StereoMamba architecture, which is specifically designed for stereo disparity estimation in RAMIS. Our approach is based on a novel Feature Extraction Mamba (FE-Mamba) module, which enhances long-range spatial dependencies both within and across stereo images. To effectively integrate multi-scale features from FE-Mamba, we then introduce a novel Multidimensional Feature Fusion (MFF) module. Experiments against the state-of-the-art on the ex-vivo SCARED benchmark demonstrate that StereoMamba achieves superior performance on EPE of 2.64 px and depth MAE of 2.55 mm, the second-best performance on Bad2 of 41.49% and Bad3 of 26.99%, while maintaining an inference speed of 21.28 FPS for a pair of high-resolution images (1280 × 1024), striking the optimum balance between accuracy, robustness, and efficiency. Furthermore, by comparing synthesized right images, generated from warping left images using the generated disparity maps, with the actual right image, StereoMamba achieves the best average SSIM (0.8970) and PSNR (16.0761), exhibiting strong zero-shot generalization on the in-vivo RIS2017 and StereoMIS datasets. I. Introduction: Stereo endoscopes are routinely employed in robotic-assisted minimally invasive surgery (RAMIS) to visualize the internal anatomy, providing surgeons with depth perception for precise instrument manipulation [1].
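For readers unfamiliar with the metrics quoted above, the sketch below shows how EPE and the BadN scores are commonly computed: EPE is the mean absolute disparity error in pixels, and BadN is the percentage of pixels whose error exceeds N pixels. It is an illustration on flat lists of disparities, not the paper's evaluation code; real benchmarks typically also apply a validity mask, which is omitted here.

```python
def abs_errors(pred, gt):
    """Per-pixel absolute disparity error for two equal-length sequences."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return [abs(p - g) for p, g in zip(pred, gt)]


def epe(pred, gt):
    """End-point error: mean absolute disparity error, in pixels."""
    errors = abs_errors(pred, gt)
    return sum(errors) / len(errors)


def bad_n(pred, gt, n):
    """BadN: percentage of pixels whose error is larger than n pixels."""
    errors = abs_errors(pred, gt)
    return 100.0 * sum(1 for e in errors if e > n) / len(errors)


if __name__ == "__main__":
    predicted = [10.0, 12.5, 20.0, 7.0]     # made-up disparities, in pixels
    ground_truth = [10.5, 10.0, 16.0, 7.0]  # made-up ground truth
    print(epe(predicted, ground_truth))         # 1.75
    print(bad_n(predicted, ground_truth, 2))    # 50.0
    print(bad_n(predicted, ground_truth, 3))    # 25.0
```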

artificial intelligence, disparity estimation, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2504.17401

Genre: Research Report (0.64)

Industry: Health & Medicine > Surgery (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Multimodal LLMs Can Reason about Aesthetics in Zero-Shot

Jiang, Ruixiang, Chen, Changwen

arXiv.org Artificial Intelligence | Jan-15-2025

We present the first study on how Multimodal LLMs' (MLLMs) reasoning ability shall be elicited to evaluate the aesthetics of artworks. To facilitate this investigation, we construct MM-StyleBench, a novel high-quality dataset for benchmarking artistic stylization. We then develop a principled method for human preference modeling and perform a systematic correlation analysis between MLLMs' responses and human preference. Our experiments reveal an inherent hallucination issue of MLLMs in art evaluation, associated with response subjectivity. ArtCoT is proposed, demonstrating that art-specific task decomposition and the use of concrete language boost MLLMs' reasoning ability for aesthetics. Our findings offer valuable insights into MLLMs for art and can benefit a wide range of downstream applications, such as style transfer and artistic image generation. Code available at https://github.com/songrise/MLLM4Art.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2501.09012

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models

Wang, Lezhong, Frisvad, Jeppe Revall, Jensen, Mark Bo, Bigdeli, Siavash Arjomand

arXiv.org Artificial Intelligence | Jun-2-2024

The demand for stereo images increases as manufacturers launch more XR devices. To meet this demand, we introduce StereoDiffusion, a method that, unlike traditional inpainting pipelines, is training-free, remarkably straightforward to use, and seamlessly integrates into the original Stable Diffusion model. Our method modifies the latent variable to provide an end-to-end, lightweight capability for fast generation of stereo image pairs, without the need for fine-tuning model weights or any post-processing of images. Using the original input to generate a left image and estimate a disparity map for it, we generate the latent vector for the right image through Stereo Pixel Shift operations, complemented by Symmetric Pixel Shift Masking Denoise and Self-Attention Layers Modification methods to align the right-side image with the left-side image. Moreover, our proposed method maintains a high standard of image quality throughout the stereo generation process, achieving state-of-the-art scores in various quantitative evaluations.
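The general idea behind a disparity-driven pixel shift can be illustrated on a single image row. The sketch below is a generic forward warp in pixel space, not the paper's latent-space Stereo Pixel Shift: each left-view pixel at column x moves to column x - d in the right view, nearer pixels (larger disparity) win when two land on the same column, and uncovered columns stay as holes. The name `shift_row` and the `hole` placeholder are assumptions for the example.

```python
def shift_row(row, disparity, hole=None):
    """Forward-warp one left-view row into the right view.

    row:       pixel values of the left image row
    disparity: per-pixel disparity in pixels (x_left - x_right, >= 0)
    hole:      value used where no source pixel lands (disocclusion)
    """
    width = len(row)
    out = [hole] * width
    best = [-1.0] * width  # disparity of the pixel currently stored per column
    for x in range(width):
        d = disparity[x]
        target = x - int(round(d))
        # Keep the nearest surface when several pixels map to one column.
        if 0 <= target < width and d > best[target]:
            out[target] = row[x]
            best[target] = d
    return out


if __name__ == "__main__":
    left = ["a", "b", "c", "d", "e"]
    disp = [0, 0, 2, 2, 0]  # "c" and "d" belong to a nearer object
    print(shift_row(left, disp))  # ['c', 'd', None, None, 'e']
```

The `None` entries mark the disoccluded region that a generation method has to fill in; the abstract attributes that step to its masking and denoising components.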

disparity map, lpip, ssim, (12 more...)

arXiv.org Artificial Intelligence

2403.04965

Country:

Europe > Denmark > Capital Region > Kongens Lyngby (0.04)
Asia (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

DynPL-SVO: A Robust Stereo Visual Odometry for Dynamic Scenes

Zhang, Baosheng, Ma, Xiaoguang, Ma, Hongjun, Luo, Chunbo

arXiv.org Artificial Intelligence | Jun-24-2023

Most feature-based stereo visual odometry (SVO) approaches estimate the motion of mobile robots by matching and tracking point features along a sequence of stereo images. However, in dynamic scenes mainly comprising moving pedestrians, vehicles, etc., there are insufficient robust static point features to enable accurate motion estimation, causing failures when reconstructing robotic motion. In this paper, we proposed DynPL-SVO, a complete dynamic SVO method that integrated united cost functions containing information between matched point features and re-projection errors perpendicular and parallel to the direction of the line features. Additionally, we introduced a dynamic grid algorithm to enhance its performance in dynamic scenes. The stereo camera motion was estimated through Levenberg-Marquardt minimization of the re-projection errors of both point and line features. Comprehensive experimental results on the KITTI and EuRoC MAV datasets showed that the accuracy of DynPL-SVO was improved by over 20% on average compared to other state-of-the-art SVO systems, especially in dynamic scenes.

artificial intelligence, line feature, point feature, (16 more...)

arXiv.org Artificial Intelligence

2205.08207

Country:

Asia > China > Sichuan Province > Chengdu (0.04)
Asia > China > Liaoning Province > Shenyang (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.64)

Industry: Media (0.74)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Vision (0.89)

DeepGlobe Road Extraction -- Challenge

#artificialintelligence | Mar-1-2022, 11:05:14 GMT

The Geoscience and Remote Sensing Society, one of the well-known communities for learning about and contributing to geospatial science, sponsored the DeepGlobe machine vision challenge in 2018, which involves deep analysis of satellite images of Earth. As part of this, I picked the problem of Road Extraction, as roads have always been a crucial part of many activities, be it transportation, traffic management, city planning, road monitoring, GPS navigation, etc. The challenges of DeepGlobe are purely research-based and focus on real problems. This is something we need to predict. The one caveat here is that we need to have an equal number of classes to consider this metric.

data prediction, prediction, prediction mask, (12 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Distance Estimation

#artificialintelligence | Nov-17-2021, 09:30:19 GMT

It is not possible to estimate the distance (depth) of a point object 'P' from the camera using a single camera 'O'. This is because 'P' lying anywhere on the projective line will map to the same point 'p' in the image. Stereo vision is a technique that can estimate the distance (depth) of a point object 'P' from the camera using two cameras. The foundation of stereo vision is similar to 3D perception in human vision and is based on the triangulation of rays from multiple viewpoints. In this tutorial, we'll be using the parallel stereo camera system for depth estimation.
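For a parallel (rectified) stereo pair, triangulation reduces to the standard relation Z = f · B / d, where f is the focal length in pixels, B is the baseline between the two cameras, and d is the disparity in pixels. The sketch below applies that relation; the function name and the example numbers are illustrative assumptions, not values from the tutorial.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z (meters) of a point seen by a parallel stereo pair.

    focal_px:     focal length expressed in pixels
    baseline_m:   distance between the two camera centers, in meters
    disparity_px: x_left - x_right for the matched point, in pixels
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical rig: 700 px focal length, 12 cm baseline.
    for d in (84.0, 42.0, 21.0):
        print(d, depth_from_disparity(700.0, 0.12, d))
```

Because depth is inversely proportional to disparity, halving the disparity doubles the estimated distance, and a one-pixel matching error costs far more accuracy for distant points than for near ones.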

distance estimation, right image, similarity, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Vision > Image Understanding (0.81)

Data Augmentation Compilation with Python and OpenCV

#artificialintelligence | Aug-20-2021, 04:40:25 GMT

Data augmentation is a technique for increasing the diversity of a dataset without collecting any more real data, while still helping to improve model accuracy and prevent overfitting. In this post, you will learn to implement the most popular and efficient data augmentation procedures for object detection tasks using Python and OpenCV. First, let's import several libraries and prepare some necessary subroutines before going ahead. The image below is used as the sample image throughout this post. Random Crop randomly selects a region and crops it out to make a new data sample; the cropped region should have the same width/height ratio as the original image to maintain the shapes of objects.
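The aspect-ratio constraint on Random Crop can be sketched as below. This is not the post's code: it only computes the crop rectangle using the standard library, and the name `random_crop_box` and the `scale` parameter are assumptions for the example. Scaling width and height by the same factor keeps the width/height ratio (up to integer rounding).

```python
import random


def random_crop_box(width, height, scale=0.5, rng=None):
    """Pick a random crop rectangle with the image's own aspect ratio.

    Returns (x0, y0, x1, y1) with x1 and y1 exclusive, so an image stored
    as a NumPy-style array could be cropped with img[y0:y1, x0:x1].
    """
    if not 0 < scale <= 1:
        raise ValueError("scale must be in (0, 1]")
    rng = rng or random.Random()
    crop_w = max(1, int(width * scale))
    crop_h = max(1, int(height * scale))
    x0 = rng.randint(0, width - crop_w)   # randint is inclusive on both ends
    y0 = rng.randint(0, height - crop_h)
    return x0, y0, x0 + crop_w, y0 + crop_h


if __name__ == "__main__":
    print(random_crop_box(640, 480, scale=0.5, rng=random.Random(1)))
```

For object detection, any bounding boxes would also need to be clipped to the crop and shifted by (x0, y0); that step is left out of this sketch.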

data augmentation, noise, python and opencv, (10 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.51)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

Object Disparity

Wang, Ynjiun Paul

arXiv.org Artificial Intelligence | Aug-17-2021

Most stereo vision work focuses on computing the dense pixel disparity of a given pair of left and right images. A camera pair usually requires lens undistortion and stereo calibration to provide an undistorted, epipolar-line-calibrated image pair for accurate dense pixel disparity computation. Due to noise, object occlusion, repetitive or missing texture, and limitations of matching algorithms, pixel disparity accuracy usually suffers the most at object boundary areas. Although statistically the total number of pixel disparity errors might be low (under 2% according to the KITTI Vision Benchmark for current top-ranking algorithms), the percentage of these disparity errors at object boundaries is very high. This leaves subsequent 3D object distance detection with much lower accuracy than desired. This paper proposes a different approach to 3D object distance detection: detecting object disparity directly, without going through dense pixel disparity computation. An example SqueezeNet Object Disparity-SSD (OD-SSD) was constructed to demonstrate efficient object disparity detection with accuracy comparable to the KITTI dataset pixel disparity ground truth. Further training and testing results with a mixed image dataset captured by several different stereo systems suggest that an OD-SSD might be agnostic to stereo system parameters such as baseline, FOV, lens distortion, and even left/right camera epipolar line misalignment.

dataset, disparity, stereo image, (16 more...)

arXiv.org Artificial Intelligence

2108.07939

Country: North America > United States > California > Santa Clara County > Cupertino (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)
